ON THE SILHOUETTE OF BINARY SEARCH TREES

By Rudolf Grübel

Leibniz Universität Hannover

A zero-one sequence describes a path through a rooted directed 
binary tree T; it also encodes a real number in [0, 1]. We regard the
level of the external node of T along the path as a function on the unit 
interval, the silhouette of T. We investigate the asymptotic behavior 
of the resulting stochastic processes for sequences of trees that are 
generated by the binary search tree algorithm. 

1. Introduction. Let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of independent random
variables, where each $\xi_n$ is uniformly distributed on the unit interval. The binary
search tree (BST) algorithm sequentially stores these variables in a sequence
$(T_n)_{n\in\mathbb{N}}$ of rooted, directed, labeled binary trees. $T_1$ consists of the root node
only, with label $\xi_1$. In order to obtain $T_{n+1}$ from $T_n$, we compare $\xi_{n+1}$ with
the labels of the nodes along a path through $T_n$, beginning at the root and
moving to the left if $\xi_{n+1}$ is smaller, to the right if it is greater than the
label associated with the respective node. Once an empty node has been
found, we attach it to the tree and $\xi_{n+1}$ is the label of the new node (formal
definitions will be given below). The BST algorithm is one of the basic and
classical search procedures and is discussed in the standard texts in this
area; see, for example, Knuth (1973), Mahmoud (1992) and Sedgewick and
Flajolet (1996).
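
The insertion step described above can be sketched in a few lines of Python. This is only an illustration under our own conventions, not code from the paper: nodes are stored as tuples of the letters 0 (left) and 1 (right), the root is the empty tuple, and the names `bst_insert` and `bst_from_keys` are hypothetical. Ties are sent to the right, which is immaterial for continuous input.

```python
def bst_insert(tree, labels, x):
    """Insert key x; `tree` is a set of 0/1 tuples, `labels` maps node -> key."""
    u = ()  # start at the root, represented by the empty word
    while u in tree:
        # append 0 to move left (x smaller than the label), 1 to move right
        u = u + ((0,) if x < labels[u] else (1,))
    tree.add(u)      # the first empty position becomes a new internal node
    labels[u] = x
    return u


def bst_from_keys(keys):
    """Build the labeled tree obtained by inserting `keys` in the given order."""
    tree, labels = set(), {}
    for x in keys:
        bst_insert(tree, labels, x)
    return tree, labels


tree, labels = bst_from_keys([0.5, 0.2, 0.8, 0.3])
```

Here the fourth key 0.3 is smaller than 0.5 and larger than 0.2, so it is stored at the node `(0, 1)`.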

Let $\mathbb{T}_n$ be the set of rooted directed binary trees with n nodes. Then $T_n$
is a random variable with values in $\mathbb{T}_n$, but the distribution of $T_n$ is not
the uniform distribution on $\mathbb{T}_n$. For uniformly distributed plane trees or,
more generally, simply generated trees, there are various codings, for
example, by depth-first search, that relate the trees to random walks (Harris
correspondence). These codings provide the basis for an in-depth study of
simply generated trees, leading to limit results that involve Brownian
excursions, and that, in turn, have applications to certain nonlinear partial
differential equations; see Aldous (1991), Chapter 6 in Pitman (2006) and
Le Gall (1999).

For the present BST case, we investigate an encoding by a function that
we call the silhouette $X(T) = (X_s(T))_{0\le s\le 1}$ of the tree T. Any $s\in[0,1]$
defines a path through T by its binary expansion and $X_s(T)$ is simply the
depth (or level) of the external node of T along this path. In contrast to other
notions such as the profile of the tree [see Chauvin, Drmota and
Jabbour-Hattab (2001)], this is a coding in the sense that T can be reconstructed
from X. The notion and use of paths through the tree is of course not new
and appears in, for example, Pittel (1985, 1986); the label "silhouette" was
coined in Grübel (2005). Applied to the output sequence $T_n$, $n\in\mathbb{N}$, of the
BST algorithm, the silhouette yields a sequence $(X_s(T_n))_{0\le s\le 1}$, $n\in\mathbb{N}$, of
stochastic processes. Our main result shows that these processes converge
in a weak sense to a nondegenerate limit process as $n\to\infty$.

Section 2 contains some formal definitions related to trees. In Section 3, 
we define the silhouette and show that distributional convergence to a non- 
degenerate limit process does not hold with respect to pointwise convergence 
on the underlying function space. The weak convergence essentially refers 
to the integrated silhouette, and a key role for the analysis of the latter 
is played by the discounted external path length, which we discuss in Sec- 
tion 4. In Section 5, we consider the finite-dimensional distributions of the 
integrated silhouette, prove the convergence to a limit process, characterize 
the limit distribution as the unique solution to a fixed point equation on a 
suitable space of measures and study the paths of the limit process. In the 
final section, we collect some comments on related questions and on possible 
variations of our findings. 

Throughout, we write $\#A$ for the number of elements of the set A and
$\mathcal{L}(X)$ for the distribution of the random variable X. Sometimes, we write
$X\sim\mu$ instead of $\mathcal{L}(X)=\mu$. A random variable X is stochastically smaller
than or equal to another random variable Y (both real-valued), written
$X\le_{\mathcal{D}}Y$, if $P(X\ge x)\le P(Y\ge x)$ for all $x\in\mathbb{R}$. Finally, "$=_{\mathcal{D}}$" denotes
equality in distribution and "$\to_{\mathcal{D}}$" denotes convergence in distribution.

2. Some notation for trees. A tree is a graph and thus consists of vertices
(or nodes) and edges. In the context of binary trees, it is convenient to
represent (or define) nodes as elements of $\mathcal{N}$,

$$\mathcal{N} := \bigcup_{k=0}^{\infty}\{0,1\}^k,$$

where $\{0,1\}^0 := \{\emptyset\}$. Stated in different terminology, $\mathcal{N}$ is the set of finite
words over the alphabet that consists of the two letters 0 and 1. By a rooted,
directed binary tree, we mean a finite set T of (internal) nodes with the
following property:

(1)  $u=(u_1,\dots,u_k)\in T,\ k\ge 1 \ \Longrightarrow\ \bar{u} := (u_1,\dots,u_{k-1})\in T.$

A binary tree can therefore be regarded as a finite and prefix-stable set of
finite words with letters 0 and 1. Informally, $u_i = 0$ means that we move to
the left (to the right, if $u_i = 1$) on the path from the root node to the node
in question.

We may interpret $\bar{u}$ in (1) as the direct ancestor or predecessor of u. The
root node is represented by the empty string $\emptyset$ and has no predecessor. The
edges of the tree are the pairs $(\bar{u}, u)$ with $u\in T$, $u\neq\emptyset$. The size of a tree is simply
the number of its nodes. $\mathbb{T}$ denotes the set of all binary trees. By a labeled
tree, we mean a pair $(T,\phi)$, with $T\in\mathbb{T}$ and a function $\phi: T\to\mathbb{R}$; the value
$\phi(u)$ is the label associated with the node $u\in T$.

Given a tree $T\in\mathbb{T}$, we may now formally define two associated trees,
L(T) and R(T), the left and right subtree of T, by

(2)  $L(T) := \{u=(u_1,\dots,u_k)\in\mathcal{N} : (0,u_1,\dots,u_k)\in T\},$
     $R(T) := \{u=(u_1,\dots,u_k)\in\mathcal{N} : (1,u_1,\dots,u_k)\in T\}.$

Obviously, any nonempty $T\in\mathbb{T}$ is uniquely determined by the corresponding
subtrees L(T) and R(T) (which may, of course, be empty). For $u\in\mathcal{N}$
and $T\in\mathbb{T}$, let T(u) be the subtree of T that consists of u, now regarded as
the root node, and all descendants of u in T. A formal definition, as in (2),
is straightforward. Indeed, we may consider L(T) and R(T) as the subtrees
T(u) associated with the nodes u = (0) and u = (1), respectively.

A node $u = (u_1,\dots,u_k)$ has depth $|u| = k$. It is an external node of T if u
itself is not an element of T, but its predecessor is. The formal definition of
the set $\partial T$ of external nodes of the tree $T\in\mathbb{T}$ is

$$\partial T := \{u=(u_1,\dots,u_k)\in\mathcal{N} : k\ge 1,\ u\notin T,\ (u_1,\dots,u_{k-1})\in T\}.$$

We augment this with the convention that $\partial T_0 = \{\emptyset\}$ for the empty tree $T_0$.

3. The silhouette. A sequence $u = (u_k)_{k\in\mathbb{N}}$ in $\{0,1\}^{\mathbb{N}}$ can be regarded as
the binary expansion $s = \sum_{k=1}^{\infty} u_k 2^{-k}$ of some number s in the unit interval
[0, 1]. Conversely, for any $s\in[0,1]$, we have a unique such binary expansion
if we require that binary rationals $s<1$ have $u_k = 0$ for all $k\ge k_0$, for
some $k_0\in\mathbb{N}$. For any $T\in\mathbb{T}$, we now introduce its silhouette as the function
$s\mapsto X_s(T)$ on the unit interval defined by

$$X_s(T) := \min\{k\in\mathbb{N} : (u_1,\dots,u_k)\notin T\};$$

an informal description of the silhouette was given in Section 1. These
functions are piecewise constant on intervals with binary rational endpoints if
the length of these intervals is chosen small enough, and they are continuous
from the right and have left-hand limits; at s = 1 they are left-continuous.
The left part of Figure 1 shows an example for a tree of size 500.
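
As a concrete illustration of this definition, the following sketch evaluates the silhouette of a small tree at a given point s. The representation (a tree as a set of 0/1 tuples) and the helper names `binary_letter` and `silhouette` are our own assumptions; floating-point values of s are used for simplicity, which is exact for binary rationals.

```python
def binary_letter(s, k):
    """k-th binary digit (k >= 1) of s in [0, 1]; s = 1 is read as 0.111..."""
    if s >= 1.0:
        return 1
    # for s < 1 this yields the expansion that ends in zeros for binary rationals
    return int(s * 2**k) % 2


def silhouette(tree, s):
    """Depth of the external node of `tree` along the path coded by s."""
    u, k = (), 0
    while u in tree:          # follow the path while we are at internal nodes
        k += 1
        u = u + (binary_letter(s, k),)
    return k                  # 0 for the empty tree, by convention


example = {(), (0,), (1,), (0, 1)}
values = [silhouette(example, s) for s in (0.0, 0.25, 0.5, 1.0)]
```

For this tree the silhouette equals 2 on [0, 1/4), 3 on [1/4, 1/2) and 2 on [1/2, 1].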

We will occasionally write X(T,s) instead of $X_s(T)$. A basic fact is the
following recursion, which, because of the preparations undertaken in the
previous section, can now be expressed concisely as

(3)  $X_s(T) = 1 + X_{2s}(L(T))\,1_{[0,1/2)}(s) + X_{2s-1}(R(T))\,1_{[1/2,1]}(s)$

for $0\le s\le 1$, provided that $T\neq\emptyset$. Obviously, $X(\emptyset)\equiv 0$.

Now, let $T_n$ be the random tree generated by the BST algorithm from
the first n variables in a sequence of independent, uniformly distributed
random variables, as explained in the Introduction [hereafter, we will simply
refer to $(T_n)_{n\in\mathbb{N}}$ as a BST sequence]. Then, for each $n\in\mathbb{N}$, the silhouette
$X(T_n) = (X_s(T_n))_{0\le s\le 1}$ can be regarded as a stochastic process with time
parameter ranging over the unit interval.

We first consider the finite-dimensional distributions of the silhouette
processes. It is well known [see, e.g., Régnier (1989)] that the present
combination of input and algorithm leads to the following stochastic dynamics
of the tree sequence: $T_{n+1}$ is obtained from $T_n$ by picking one of the n + 1
external nodes of $T_n$ uniformly at random and then adding this node to $T_n$.
As a consequence, the sequence $(X_s(T_n))_{n\in\mathbb{N}}$ of depths of the external nodes
along the path s can be represented as the sequence of partial sums of an
independent sequence $(I_n)_{n\in\mathbb{N}}$ of indicator variables, with $P(I_n = 1) = 1/n$
for all $n\in\mathbb{N}$. Extending this observation to more than one path provides
the basis for our first result.
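
The dynamics just described (promote a uniformly chosen external node to an internal node) can be simulated directly. The sketch below is a hypothetical illustration with our own helper names `external_nodes` and `grow`; it recomputes the external nodes at every step, which is quadratic in n but keeps the code close to the description.

```python
import random


def external_nodes(tree):
    """Sorted list of the external nodes of `tree` (a set of 0/1 tuples)."""
    if not tree:
        return [()]  # convention: the empty tree has the root position as external node
    return sorted(u + (b,) for u in tree for b in (0, 1) if u + (b,) not in tree)


def grow(n, rng):
    """Tree with n nodes: repeatedly add a uniformly chosen external node."""
    tree = set()
    for _ in range(n):
        tree.add(rng.choice(external_nodes(tree)))
    return tree


tree = grow(50, random.Random(1))
```

A tree with n nodes always has n + 1 external nodes, so each step picks among n + 1 equally likely positions.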

Theorem 1. Let $s_1,\dots,s_d$, with $0\le s_1<\cdots<s_d\le 1$, be given. The
d-dimensional random vectors $Y_n = (Y_{n,1},\dots,Y_{n,d})$ with

$$Y_{n,l} := \frac{X(T_n,s_l)-\log n}{\sqrt{\log n}} \quad\text{for } l=1,\dots,d$$

then converge in distribution as $n\to\infty$ to a d-dimensional standard normal
limit.

Proof. Given $s_1,\dots,s_d$, we define a sequence $(Z_n)_{n\in\mathbb{N}}$ of d-dimensional
random vectors $Z_n = (Z_{n,1},\dots,Z_{n,d})$ by $Z_{n,l} = X(T_n,s_l) - X(T_{n-1},s_l)$. As
we add one node at a time, these all take values in the set consisting of
the zero vector and the standard basis vectors $e_j$, $j=1,\dots,d$, where the
kth component of $e_j$ is 1 if k = j and 0 otherwise. Let $(\tilde{Z}_n)_{n\in\mathbb{N}}$ be another
such sequence where, now, $\tilde{Z}_n = (\tilde{Z}_{n,1},\dots,\tilde{Z}_{n,d})$, $n\in\mathbb{N}$, are independent,
with $P(\tilde{Z}_n = e_l) = 1/n$ for $l=1,\dots,d$ and $P(\tilde{Z}_n = (0,0,\dots,0)) = 1 - d/n$
if $n\ge d$. For the "tilded" variables, we have asymptotic normality by the
multivariate version of the Lindeberg-Feller central limit theorem (or the
usual one-dimensional version, together with the Cramér-Wold device). The
difference between the standardized partial sum processes associated with
the $Z$- and the $\tilde{Z}$-variables is asymptotically negligible as the time that the
last of the pairwise last common ancestors is reached is finite with probability 1. □

Hence, if we consider the silhouette itself, the appropriate scaling would
lead to independent components in the limit. This shows that in order to
obtain a nondegenerate limit process for the silhouette sequence $(X(T_n))_{n\in\mathbb{N}}$
associated with a BST sequence $(T_n)_{n\in\mathbb{N}}$, we need to weaken the notion of
convergence. A standard strategy is to regard the paths of the process $X(T_n)$
as "weak" functions in the sense of linear forms on some function space $\mathcal{F}$,
that is, to investigate the random linear functionals $f\mapsto\int X_t(T_n)f(t)\,dt$,
$f\in\mathcal{F}$. A key role is played by $f\equiv 1$, a case that we study in the next section.

4. The discounted external path length. For a binary tree T, let

$$U_k(T) := \#\{u\in\partial T : |u| = k\}$$

be the number of external nodes of T with depth k, $k\in\mathbb{N}_0$. We then have

$$\eta(T) := \int_0^1 X_s(T)\,ds = \sum_{k=0}^{\infty} k\,2^{-k}\,U_k(T),$$

which means that we can regard the integral of the silhouette over the whole
unit interval as a discounted external path length of the tree.
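
To make the discounting concrete: if $U_k(T)$ denotes the number of external nodes at depth k, the integral of the silhouette equals $\sum_k k\,2^{-k}U_k(T)$, since an external node at depth k covers an interval of length $2^{-k}$. The sketch below computes this quantity exactly with rational arithmetic. The tree representation (set of 0/1 tuples) and the function names are our own assumptions.

```python
from fractions import Fraction


def external_depth_counts(tree):
    """U_k(T) as a dict: depth k -> number of external nodes at depth k."""
    if not tree:
        return {0: 1}  # convention for the empty tree
    counts = {}
    for u in tree:
        for b in (0, 1):
            v = u + (b,)
            if v not in tree:
                counts[len(v)] = counts.get(len(v), 0) + 1
    return counts


def discounted_epl(tree):
    """Sum over k of k * 2^(-k) * U_k(T), as an exact fraction."""
    return sum(Fraction(k * c, 2**k) for k, c in external_depth_counts(tree).items())


def kraft_sum(tree):
    """Sum over k of 2^(-k) * U_k(T); equals 1 for every binary tree."""
    return sum(Fraction(c, 2**k) for k, c in external_depth_counts(tree).items())


before = discounted_epl({(), (0,), (1,)})
after = discounted_epl({(), (0,), (1,), (0, 1)})
```

In this example, turning the external node `(0, 1)` at level 2 into an internal node raises the discounted path length from 2 to 9/4, an increase of $2^{-2}$.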

Now, let $(T_n)_{n\in\mathbb{N}}$ be a BST sequence, as explained in the previous section;
we abbreviate $\eta(T_n)$ to $\eta_n$. For the proof of Theorem 1, we have used the
dynamical view of this sequence, obtaining $T_{n+1}$ from $T_n$ by inclusion of
a randomly chosen element of $\partial T_n$. For the analysis of the BST sequence,
the recursive view is equally important: for $n\ge 1$, the subtrees $L(T_n)$ and
$R(T_n)$ are conditionally independent given $I_n := \#L(T_n)$, $I_n$ is uniformly
distributed on $\{0,\dots,n-1\}$ and, on $I_n = k$, $L(T_n) =_{\mathcal{D}} T_k$, $R(T_n) =_{\mathcal{D}} T_{n-1-k}$.
This is a consequence of the fact that the subsequences of the input
sequence $(\xi_n)_{n\in\mathbb{N}}$ to the BST algorithm that consist of the values smaller than
and greater than $\xi_1$ are independent conditionally on $\xi_1$ and uniformly
distributed on the intervals $(0,\xi_1)$ and $(\xi_1,1)$, respectively. Hence, we obtain
from (3) (or directly) that, for $n\ge 1$,

(4)  $\eta_n =_{\mathcal{D}} 1 + \tfrac12\bigl(\eta_{I_n} + \eta'_{n-1-I_n}\bigr),$

with $(\eta_m)_{m\in\mathbb{N}_0}$, $(\eta'_m)_{m\in\mathbb{N}_0}$ and $I_n$ independent, $\eta'_m =_{\mathcal{D}} \eta_m$ for all $m\in\mathbb{N}_0$ and
$I_n\sim\mathrm{unif}\{0,1,\dots,n-1\}$. Clearly, $\eta_0 = 0$. Let $H(n) = H_n = \sum_{k=1}^{n} 1/k$ be the
nth harmonic number. For $a_n := E\eta_n$, we have $a_0 = 0$ and (4) implies that

$$a_n = 1 + \frac1n\sum_{m=0}^{n-1} a_m \quad\text{for all } n\in\mathbb{N},$$

which easily leads to

(5)  $E\eta_n = H_n.$
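
The step from the recursion for the means to the harmonic numbers can be checked mechanically. Writing $a_n$ for the expected discounted external path length, the recursion reads $a_0 = 0$ and $a_n = 1 + \frac1n\sum_{m=0}^{n-1}a_m$. The following sketch (function names are ours) iterates it in exact rational arithmetic and compares the result with $H_n$.

```python
from fractions import Fraction


def expected_eta(n_max):
    """Solve a_0 = 0, a_n = 1 + (1/n) * sum_{m<n} a_m for n = 1..n_max."""
    a = [Fraction(0)]
    for n in range(1, n_max + 1):
        a.append(1 + sum(a) / n)
    return a


def harmonic(n):
    """n-th harmonic number H_n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


a = expected_eta(20)
```

The identity behind this is $\sum_{m=1}^{n-1}H_m = nH_n - n$, so that $1 + \frac1n\sum_{m<n}H_m = H_n$.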

The undiscounted path length $\sum_{k=1}^{\infty} k\,U_k(T_n)$ plays a key role in the
analysis of Quicksort and we now use the techniques that were successful in that
situation: martingales [see Régnier (1989)] and the contraction method [see
Rösler (1991)]. The filtration $(\mathcal{F}_n)_{n\in\mathbb{N}}$ of interest in the former context will
be the one generated by the sequence $(T_n)_{n\in\mathbb{N}}$. We write $L^2$ for the set of
square integrable random variables.



Theorem 2. As $n\to\infty$, $\eta_n - H_n$ converges almost surely and in quadratic
mean to a random variable $\eta_\infty$. Within the set of distributions with finite
second moment and zero mean, the distribution of $\eta_\infty$ is characterized by
the fixed point equation

(6)  $\eta_\infty =_{\mathcal{D}} \tfrac12(\eta_\infty + \eta'_\infty) + \zeta_\infty,$

where $\eta_\infty$, $\eta'_\infty$ and $\zeta_\infty$ are independent, $\eta_\infty =_{\mathcal{D}} \eta'_\infty$ and

(7)  $\zeta_\infty := 1 + \tfrac12\bigl(\log(\xi) + \log(1-\xi)\bigr),$

with $\xi$ uniformly distributed on the unit interval.

Proof. The transition from $T_n$ to $T_{n+1}$ means that an external node of
(random) level K becomes an internal node. This entails a loss of $K2^{-K}$, but,
as the new internal node spawns two external nodes at level K + 1, there is
also a gain of $2(K+1)2^{-K-1}$ for the discounted external path length, hence

$$\eta_{n+1} - \eta_n = 2(K+1)2^{-K-1} - K2^{-K} = 2^{-K}.$$

By the stochastic dynamics of the tree sequence described in Section 3, we
have that, given $T_n$ with the associated values $U_k(T_n)$ for the number of
external nodes at level k,

$$P[K = k \mid T_n] = \frac{1}{n+1}\,U_k(T_n) \quad\text{for all } k\in\mathbb{N}.$$

Hence, with $\mathcal{F}_n$ as defined above,

(8)  $E[\eta_{n+1} - \eta_n \mid \mathcal{F}_n] = \frac{1}{n+1}\sum_{k=0}^{\infty} U_k(T_n)\,2^{-k} = \frac{1}{n+1}.$

[The last equality uses the well-known fact that $\sum_{k=0}^{\infty} 2^{-k}U_k(T) = 1$ for all
binary trees T. Note that $2^{-k}U_k(T_n)$ is the Lebesgue measure of $X(T_n)^{-1}(\{k\})$,
so this fact has a simple interpretation in terms of the silhouette.] From (8),
we immediately obtain that $(\eta_n - H_n, \mathcal{F}_n)_{n\in\mathbb{N}}$ is a zero mean martingale; (8)
also provides an alternative proof for (5). The individual random variables
are all bounded and hence elements of $L^2$.

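We next show that the martingale is bounded in $L^2$. Let $\sigma_n^2 := E(\eta_n - H_n)^2 = \mathrm{var}(\eta_n)$.
From (4), we obtain

$$\sigma_n^2 = \tfrac14\,\mathrm{var}\bigl(\eta_{I_n} + \eta'_{n-1-I_n}\bigr).$$

Because of

$$E\bigl(\mathrm{var}[\eta_{I_n} + \eta'_{n-1-I_n} \mid I_n]\bigr) = E\bigl(\sigma^2_{I_n} + \sigma^2_{n-1-I_n}\bigr) = \frac{2}{n}\sum_{m=0}^{n-1}\sigma_m^2$$

and

$$\mathrm{var}\bigl(E[\eta_{I_n} + \eta'_{n-1-I_n} \mid I_n]\bigr) = \mathrm{var}\bigl(H_{I_n} + H_{n-1-I_n}\bigr),$$

the conditional variance formula leads to

$$\sigma_n^2 = \frac{1}{2n}\sum_{m=0}^{n-1}\sigma_m^2 + \tfrac14\,b_n, \quad\text{with } b_n := \mathrm{var}\bigl(H_{I_n} + H_{n-1-I_n}\bigr).$$

As $((\eta_n - H_n)^2)_{n\in\mathbb{N}_0}$ is a submartingale, we have that $m\mapsto\sigma_m^2$ is
nondecreasing, so the sum may be bounded from above by $\sigma_{n-1}^2/2$. To obtain
boundedness of the sequence $(\sigma_n^2)_{n\in\mathbb{N}_0}$, it is therefore enough to show that
the sequence $(b_n)_{n\in\mathbb{N}}$ is bounded. This, in turn, will follow from the
boundedness of $(E(H_{I_n} - H_n)^2)_{n\in\mathbb{N}}$ if we use Minkowski's inequality and the fact
that $I_n$ and $n-1-I_n$ have the same distribution. From the elementary
inequalities

$$\log m \le H_m \le \log m + 1 \quad\text{for all } m\in\mathbb{N},$$

we obtain

$$\log\frac{I_n}{n} - 1 \le H_{I_n} - H_n \le \log\frac{I_n}{n} + 1$$

on $\{I_n > 0\}$. Hence, the required boundedness will be implied by

$$\sup_{n\in\mathbb{N}} E\Bigl(1_{\{I_n>0\}}\log^2\frac{I_n}{n}\Bigr) < \infty,$$

which finally follows from

$$\lim_{n\to\infty}\frac1n\sum_{m=1}^{n-1}\log^2\frac{m}{n} = \int_0^1 (\log x)^2\,dx = 2.$$

Because of its boundedness in $L^2$, the martingale $(\eta_n - H_n)_{n\in\mathbb{N}}$ converges
to some limit $\eta_\infty$ almost surely and in $L^2$ as $n\to\infty$.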

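Next, we derive the fixed point equation for the distribution of $\eta_\infty$. We
first note that (4) implies that for $n\ge 1$,

(9)  $\eta_n - H_n =_{\mathcal{D}} \tfrac12\bigl(\eta_{I_n} - H_{I_n}\bigr) + \tfrac12\bigl(\eta'_{n-1-I_n} - H_{n-1-I_n}\bigr) + \zeta_n$

with

$$\zeta_n := 1 + \tfrac12\bigl(H_{I_n} + H_{n-1-I_n}\bigr) - H_n.$$

We may assume that $I_n = \lfloor n\xi\rfloor$ with $\xi\sim\mathrm{unif}(0,1)$. With the standard
asymptotic result for harmonic numbers, $H_n = \log(n) + \gamma + o(1)$, we then
obtain

$$\begin{aligned}
\zeta_n &= 1 + \tfrac12\bigl(\log(\lfloor n\xi\rfloor) - \log(n)\bigr) + \tfrac12\bigl(\log(n-1-\lfloor n\xi\rfloor) - \log(n)\bigr) + o(1)\\
&= 1 + \tfrac12\log\Bigl(\frac{\lfloor n\xi\rfloor}{n}\Bigr) + \tfrac12\log\Bigl(1 - \frac{1+\lfloor n\xi\rfloor}{n}\Bigr) + o(1)\\
&\to 1 + \tfrac12\log(\xi) + \tfrac12\log(1-\xi).
\end{aligned}$$

Using $\eta_n - H_n \to_{\mathcal{D}} \eta_\infty$ and the distributional assumptions on $(\eta_n)_{n\in\mathbb{N}}$, $(\eta'_n)_{n\in\mathbb{N}}$
and $I_n$, we now obtain (6) by letting $n\to\infty$ in (9).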

Finally, the right-hand side of (6) defines an operator $\Psi$ on the space $\mathcal{M}_{2,0}$
of distributions $\mu$ with mean zero and finite second moment. A straightforward
calculation shows that

$$d_2^2(\Psi(\mu_1),\Psi(\mu_2)) \le \tfrac12\,d_2^2(\mu_1,\mu_2) \quad\text{for all } \mu_1,\mu_2\in\mathcal{M}_{2,0},$$

where $d_2$ denotes the usual Wasserstein 2-distance. Hence, $\Psi$ is a contraction
and the distribution of $\eta_\infty$ is characterized in $\mathcal{M}_{2,0}$ by (6). □

Note that with $\xi$ as in the theorem, $-\log(\xi)$ and $-\log(1-\xi)$ both have an
exponential distribution with mean 1 (of course, they are not independent);
in particular, $\zeta_\infty$ has finite moments of all orders, $E\zeta_\infty = 0$ and $\mathrm{var}(\zeta_\infty) =
1 - \pi^2/12 \approx 0.1775$. The fact that the distribution of $\zeta_\infty$ is nondegenerate
implies the same for the distribution of $\eta_\infty$.
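
The stated variance can be verified numerically. Since $\log\xi$ and $\log(1-\xi)$ each have variance 1 and mean $-1$, one finds $\mathrm{var}(\zeta_\infty) = \tfrac12 E[\log(\xi)\log(1-\xi)]$, and expanding $\log(1-x) = -\sum_{k\ge1}x^k/k$ gives $E[\log(\xi)\log(1-\xi)] = \sum_{k\ge1}1/(k(k+1)^2)$. The sketch below sums this series; the function names and the truncation level are our own choices.

```python
import math


def cross_moment(terms=100_000):
    """E[log(xi) * log(1 - xi)] for xi ~ unif(0, 1), via sum 1 / (k (k+1)^2)."""
    return sum(1.0 / (k * (k + 1) ** 2) for k in range(1, terms + 1))


def zeta_variance(terms=100_000):
    """var(zeta) = (1/4) * (1 + 1 + 2 * (cross_moment - 1)) = cross_moment / 2."""
    return 0.5 * cross_moment(terms)


approx = zeta_variance()
exact = 1.0 - math.pi**2 / 12.0
```

The truncation error of the series after N terms is of order $1/(2N^2)$, so the default of 100000 terms is far more than needed for four decimal places.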
It is known that in order to obtain a nondegenerate distributional limit
for the undiscounted external path length $\tilde{\eta}_n := \sum_{k=1}^{\infty} k\,U_k(T_n)$, we need to shift
and rescale: $(\tilde{\eta}_n - 2n\log n)/n \to_{\mathcal{D}} \tilde{\eta}_\infty$, where $\tilde{\eta}_\infty$ is a real, nonconstant
random variable [see Régnier (1989) and Rösler (1991)]. It may seem surprising
that for the discounted version, it is enough to shift. Roughly, because of the
asymptotic independence of the marginal distributions of the silhouette, the
"thin spikes" and the "broad valleys" cancel out to a certain extent; see
Figure 1. Note that $\tilde{\eta}_n/(n+1)$ and $\eta_n$ can be regarded as the means associated
with the random distributions $\tilde{\mu}_n$ and $\mu_n$, respectively, where $\tilde{\mu}_n$ has density
$k\mapsto 2^k/(n+1)$ with respect to $\mu_n$. In fact, if we regard $\tilde{\eta}_n/(n+1)$ as the
basic variables, then, again, it is enough to shift to obtain a nondegenerate
limit distribution.

Despite the close connection between $\tilde{\mu}_n$ and $\mu_n$, the fixed point equations
for the respective distributional limits are quite different in the two cases.
Indeed, in the undiscounted case, Rösler (1991) obtained the equation

(10)  $\tilde{\eta}_\infty =_{\mathcal{D}} \xi\,\tilde{\eta}_\infty + (1-\xi)\,\tilde{\eta}'_\infty + C(\xi),$

where $\xi$ is uniformly distributed on the unit interval, $\tilde{\eta}_\infty$, $\tilde{\eta}'_\infty$ and $\xi$ are
independent, $\tilde{\eta}_\infty =_{\mathcal{D}} \tilde{\eta}'_\infty$ and

(11)  $C(x) = 1 + 2\bigl(x\log(x) + (1-x)\log(1-x)\bigr).$

A first major difference between (10) and (6) is the fact that in the latter
case, the linear combination on the right-hand side of the independent copies
of the left-hand side has the deterministic coefficients 1/2 and 1/2 instead
of $\xi$ and $1-\xi$. As we will see in the next result, this makes it possible to
obtain a simple and explicit representation for $\eta_\infty$, which is lacking in the
undiscounted case. We mention, in passing, that the distribution of $\tilde{\eta}_\infty$ has
been the subject of considerable attention; see, for example, Cramer (1996),
Devroye, Fill and Neininger (2000) and Fill and Janson (2000). A second
important difference is the fact that the function C in (11) is bounded,
whereas the corresponding function of $\xi$ in (7) is only bounded from above.
This has an important consequence for the tail behavior (finiteness domain
of the moment generating function) of the respective solutions.

Theorem 3. Let $\{\zeta_{n,k} : n\in\mathbb{N}_0,\ k\in\{1,\dots,2^n\}\}$ be a family of independent
random variables, all with the same distribution as $\zeta_\infty$, where $\zeta_\infty$ is
given in (7). Then, with $\eta_\infty$ as in Theorem 2,

(12)  $\eta_\infty =_{\mathcal{D}} \sum_{n=0}^{\infty} 2^{-n}\sum_{k=1}^{2^n}\zeta_{n,k}.$

Further, the moment generating function $M(t) = E\exp(t\eta_\infty)$ for $\eta_\infty$ is finite
for all $t > -2$. Finally, with $\eta_n$ as in Theorem 2, we have that

(13)  $M_n(t) := E\exp\bigl(t(\eta_n - H_n)\bigr) \le M(t) \quad\text{for all } t > -2.$
Proof. It is easily checked that the sequence $(\eta_{\infty,n})_{n\in\mathbb{N}}$ of partial sums,

$$\eta_{\infty,n} := \sum_{m=0}^{n} 2^{-m}\sum_{k=1}^{2^m}\zeta_{m,k},$$

is a zero mean martingale that is bounded in $L^2$ and hence converges almost
surely and in $L^2$ to a random variable $\eta_{\infty,\infty}$. It is equally easy to check
that $\eta_{\infty,\infty}$ solves (6), hence (12) follows with the uniqueness statement in
Theorem 2.

We know from the proof of Theorem 2 that $(\eta_n - H_n)_{n\in\mathbb{N}}$ is a martingale
that converges in $L^2$ to $\eta_\infty$. This convergence implies the representation

(14)  $\eta_n - H_n = E[\eta_\infty \mid \mathcal{F}_n] \quad\text{for all } n\in\mathbb{N},$

where $(\mathcal{F}_n)_{n\in\mathbb{N}}$ denotes the natural filtration associated with $(\eta_n - H_n)_{n\in\mathbb{N}}$.
We now know that $\eta_\infty$ and $\eta_{\infty,\infty}$ have the same moment generating function,
hence (13) follows from (14) on using Jensen's inequality for conditional
expectations.

It remains to prove the finiteness of M(t) for $t > -2$. The moment
generating function for $\zeta_\infty$ can be given explicitly as

$$M_\zeta(t) = e^t\int_0^1 x^{t/2}(1-x)^{t/2}\,dx = e^t\,\frac{\Gamma(1+t/2)^2}{\Gamma(2+t)},$$

where, of course, $t/2 > -1$ is required for the integral to be finite. From (12),
we obtain

$$M(t) = \prod_{n=0}^{\infty} M_\zeta(2^{-n}t)^{2^n}.$$

As $E\zeta_\infty = 0$ and hence $M'_\zeta(0) = 0$, it is straightforward to show that the
product converges for $t > -2$. □
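
For readers who want to see the product formula at work, here is a small numerical sketch. It assumes the closed form $M_\zeta(t) = e^t\,\Gamma(1+t/2)^2/\Gamma(2+t)$ for the moment generating function of $\zeta_\infty$ (a Beta integral) and evaluates the logarithm of a truncated product. The function names and the truncation at 20 levels are our own choices; more levels would amplify rounding errors because each term is multiplied by $2^n$.

```python
import math


def log_mgf_zeta(t):
    """log E exp(t * zeta) = t + 2 log Gamma(1 + t/2) - log Gamma(2 + t), t > -2."""
    return t + 2.0 * math.lgamma(1.0 + t / 2.0) - math.lgamma(2.0 + t)


def log_mgf_eta(t, levels=20):
    """Truncated log of prod_{n >= 0} M_zeta(2^(-n) t)^(2^n)."""
    return sum(2**n * log_mgf_zeta(t / 2**n) for n in range(levels))


# For small t, log M(t) is close to var(eta) * t^2 / 2 with var(eta) = 2 - pi^2 / 6.
t = 0.01
variance_estimate = log_mgf_eta(t) / (0.5 * t * t)
```

The second-order check uses that the variances of the levels add up to $\sum_n 2^{-n}\,\mathrm{var}(\zeta_\infty) = 2 - \pi^2/6$.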

In the undiscounted case, the moment generating function exists on the 
whole real line. The inequality (13) will be used to obtain a uniform tail 
bound for the variables $\eta_n - H_n$ that is needed in the next section.

5. The integrated silhouette. We now return to the functional point of
view explained at the end of Section 3, taking for $\mathcal{F}$ the set of indicator
functions $1_{[0,t]}$, $0\le t\le 1$. This leads us to consider the integrated silhouette
$Y(T) = (Y_t(T))_{0\le t\le 1}$ associated with a binary tree T, where

$$Y_t(T) := \int_0^t X_s(T)\,ds \quad\text{for all } t\in[0,1].$$

We define the normalized version $Y^\circ(T) = (Y_t^\circ(T))_{0\le t\le 1}$ by

$$Y_t^\circ(T) := Y_t(T) - t\,Y_1(T) \quad\text{for all } t\in[0,1].$$

This "ties down" the original process, in the sense that $Y_0^\circ(T) = Y_1^\circ(T) = 0$.
Also, $Y_1(T) = \eta(T)$, the discounted external path length discussed in the
previous section. We also use $\eta^\circ(T) := \eta(T) - H(\#T)$ to denote the centered
external path length.

Suppose, now, that $(T_n)_{n\in\mathbb{N}}$ is a BST sequence. Our aim is a functional
limit theorem for the resulting sequence $(Y(T_n))_{n\in\mathbb{N}}$ of stochastic processes.
Instead of $Y(T_n)$, we will consider the pairs

(15)  $Z_n := \begin{pmatrix} Y^\circ(T_n)\\ \eta^\circ(T_n)\end{pmatrix},$

which we regard as random quantities with values in the linear space

(16)  $S := C_{00}[0,1]\times\mathbb{R},$

where $C_{00}[0,1]$ denotes the set of all continuous functions $f:[0,1]\to\mathbb{R}$ that
have $f(0) = f(1) = 0$. We can obviously recover $Y(T_n)$ from $Z_n$ (and $H_n$).
Together with

$$\|z\| := \|f\|_\infty + |a| \quad\text{for all } z = (f,a)\in S,$$

the linear space S becomes a separable Banach space. Here, $\|f\|_\infty :=
\sup_{0\le t\le 1}|f(t)|$ denotes the supremum norm. We will show that $Z_n$ converges
in distribution as $n\to\infty$, where the convergence refers to the topological
structure induced by this norm. For this, we follow the classical route, as
laid out in Billingsley (1968), considering first the finite-dimensional
distributions and then proving tightness, but the actual details need to be adapted
to the present setup.

In connection with the finite-dimensional distributions, instead of
considering the $Y^\circ(T_n)$ part of $Z_n$ at arbitrary arguments $t_0,\dots,t_d\in[0,1]$, we
restrict ourselves to complete sets of binary rationals of the same depth,
that is, we take $d = 2^k$ and $t_j = j2^{-k}$ for $j = 0,\dots,2^k$. A standard argument
using the continuity of the paths of $Y^\circ(T_n)$ shows that weak convergence
of the resulting random vectors, for all $k\in\mathbb{N}$, is enough to characterize the
limit distribution. In fact, in order to simplify the description of the
limiting finite-dimensional distributions, we will not consider the values $Y_t^\circ(T_n)$
themselves, but rather the differences. Again, in the present situation, this
suffices as $Y_0^\circ(T_n) = 0$.

Formally, for $k\in\mathbb{N}$, let $\Delta_k$ be the operator that maps a function $f:[0,1]\to
\mathbb{R}$ to a vector $\Delta_k f = ((\Delta_k f)_j)_{j=1}^{2^k}$ of dimension $2^k$, with

$$(\Delta_k f)_j := f(j2^{-k}) - f((j-1)2^{-k}) \quad\text{for } j = 1,\dots,2^k.$$

The following theorem shows that these increments converge in distribution
and also gives a description of the limits. For the statement of the result, we
need some more notation. Let $u(k,j) = (u_1(k,j),\dots,u_k(k,j))\in\{0,1\}^k$, $j =
1,\dots,2^k$, be the nodes of depth $k\in\mathbb{N}$ in the order of their associated binary
rationals [the components can be given explicitly as $u_m(k,j) = \lfloor 2^{-k+m}(j-1)\rfloor \pmod 2$]
and let

$$u(k,j,l) = (u_1(k,j),\dots,u_l(k,j))$$

be the ancestor of u(k,j) at depth l, $l = 0,\dots,k-1$, with the understanding
that $u(k,j,0) = \emptyset$. We also write $\mathbf{1}_k$ for the $2^k$-dimensional vector that has
all entries equal to 1.
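
The indexing of the depth-k nodes is easy to get wrong by one, so a short sketch may help. It implements the explicit formula $u_m(k,j) = \lfloor 2^{m-k}(j-1)\rfloor \bmod 2$ with bit shifts; the function names `node`, `ancestor` and `left_endpoint` are our own.

```python
def node(k, j):
    """u(k, j) for 1 <= j <= 2^k: the j-th node of depth k, read left to right."""
    # floor(2^(m-k) * (j-1)) equals (j-1) shifted right by k-m bits
    return tuple(((j - 1) >> (k - m)) & 1 for m in range(1, k + 1))


def ancestor(k, j, l):
    """u(k, j, l): the ancestor of u(k, j) at depth l (the empty tuple for l = 0)."""
    return node(k, j)[:l]


def left_endpoint(u):
    """Binary rational with digits u, i.e. the left end of the interval below u."""
    return sum(b / 2 ** (m + 1) for m, b in enumerate(u))


depth_two = [node(2, j) for j in range(1, 5)]
```

With this convention the node u(k, j) sits above the interval $[(j-1)2^{-k}, j2^{-k})$; for instance u(3, 6) = (1, 0, 1) with left endpoint 5/8.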

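Theorem 4. For $k\in\mathbb{N}$ fixed and with $n\to\infty$,

(17)  $\begin{pmatrix}\Delta_k Y^\circ(T_n)\\ \eta^\circ(T_n)\end{pmatrix} \to_{\mathcal{D}} \begin{pmatrix} 2^{-k}\bigl(k\mathbf{1}_k + \rho_k + \eta_k - \eta^\circ_\infty\mathbf{1}_k\bigr)\\ \eta^\circ_\infty\end{pmatrix}.$

Here, $\rho_k = (\rho_{k,1},\dots,\rho_{k,2^k})$ and $\eta_k = (\eta_{k,1},\dots,\eta_{k,2^k})$ are independent
$2^k$-dimensional random vectors, given (distributionally) as follows. The components
$\eta_{k,j}$ of $\eta_k$ are independent and each $\eta_{k,j}$ has the same distribution as $\eta_\infty$,
defined in (6). Further, the components $\rho_{k,j}$ of $\rho_k$ have the joint distributional
representation

(18)  $\rho_{k,j} =_{\mathcal{D}} \sum_{\{l:\,u_l(k,j)=1\}} \log\xi_{u(k,j,l-1)} + \sum_{\{l:\,u_l(k,j)=0\}} \log\bigl(1-\xi_{u(k,j,l-1)}\bigr),$

where $\xi_u$, $u\in\mathcal{N}$, is a family of independent random variables, all uniformly
distributed on the unit interval. Finally,

$$\eta^\circ_\infty = 2^{-k}\sum_{j=1}^{2^k}\bigl(k + \rho_{k,j} + \eta_{k,j}\bigr).$$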

Proof. Let $T^{k,j} := T(u(k,j))$ denote the subtree of T with root u(k,j).
We have, provided that the fill level of T is at least k so that none of the
subtrees is empty,

(19)
$$\begin{aligned}
(\Delta_k Y^\circ(T))_j &= Y^\circ_{j2^{-k}}(T) - Y^\circ_{(j-1)2^{-k}}(T)\\
&= \int_{(j-1)2^{-k}}^{j2^{-k}} X_s(T)\,ds - 2^{-k}\eta(T)\\
&= 2^{-k}\bigl(k + \eta(T^{k,j}) - \eta(T)\bigr)\\
&= 2^{-k}\bigl(k + H(\#T^{k,j}) - H(\#T) + \eta^\circ(T^{k,j}) - \eta^\circ(T)\bigr)
\end{aligned}$$

for $j = 1,\dots,2^k$. Further, these differences sum to zero in view of $Y_0^\circ(T) =
Y_1^\circ(T)$, hence,

(20)  $\eta^\circ(T) = 2^{-k}\sum_{j=1}^{2^k}\bigl(k + \eta^\circ(T^{k,j}) + H(\#T^{k,j}) - H(\#T)\bigr).$
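
The key identity behind (19) is that the integral of the silhouette over the dyadic interval below a depth-k node u equals $2^{-k}(k + \eta(T(u)))$, where $\eta$ is the discounted external path length and T(u) the subtree rooted at u. The sketch below checks this on a small tree with exact arithmetic. All names (`eta`, `subtree`, `cell_integral`) and the set-of-tuples representation are our own assumptions.

```python
from fractions import Fraction


def eta(tree):
    """Discounted external path length: sum of |v| 2^(-|v|) over external nodes v."""
    total = Fraction(0)
    for u in tree:
        for b in (0, 1):
            v = u + (b,)
            if v not in tree:
                total += Fraction(len(v), 2 ** len(v))
    return total


def subtree(tree, u):
    """T(u): descendants of u (including u), re-rooted at the empty word."""
    k = len(u)
    return {v[k:] for v in tree if v[:k] == u}


def cell_integral(tree, k, j):
    """Integral of the silhouette over [(j-1) 2^-k, j 2^-k), fill level >= k assumed."""
    D = max(len(u) for u in tree) + 1   # every external node has depth <= D
    total = Fraction(0)
    for i in range((j - 1) * 2 ** (D - k), j * 2 ** (D - k)):
        u, m = (), 0
        while u in tree:                # follow the digits of i 2^-D down the tree
            m += 1
            u = u + ((i >> (D - m)) & 1,)
        total += Fraction(m, 2**D)      # silhouette is constant (= m) on this cell
    return total


T = {(), (0,), (1,), (0, 1), (1, 0), (1, 0, 0)}
left = cell_integral(T, 1, 1)
right = cell_integral(T, 1, 2)
```

For this tree the two halves of the unit interval contribute 5/4 and 11/8, which add up to the discounted external path length 21/8 of the whole tree.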
The development thus far has been for a fixed tree $T\in\mathbb{T}$. We now substitute
the elements $T_n$ of a BST sequence for T. Let $N_{n,k} := (N_{n,k,1},\dots,N_{n,k,2^k})$
be the random vector that counts the size of the subtrees of $T_n$ at level k,
that is,

$$N_{n,k,j} = \#T_n^{k,j}, \quad j = 1,\dots,2^k.$$

Because of (19) and (20), in order to obtain the distributional limits in (17),
it is enough to work out the asymptotic behavior of the vector

(21)  $\bigl(\eta^\circ(T_n^{k,1}),\dots,\eta^\circ(T_n^{k,2^k}),\ H(N_{n,k,1}) - H(n),\dots,H(N_{n,k,2^k}) - H(n)\bigr)$

as $n\to\infty$ (the quantities of interest can be written as a fixed linear function
of these vectors, so the continuous mapping theorem applies).

Given that a node u at level l has $\#T_n(u) = m$, there are $\lfloor m\xi_u\rfloor$ nodes
in the left subtree and $m - 1 - \lfloor m\xi_u\rfloor$ nodes in the right subtree of $T_n(u)$,
independent of what happened at levels $0,\dots,l-1$. Hence,

$$\frac1n N_{n,k} \to_{\mathcal{D}} V_k = (V_{k,1},\dots,V_{k,2^k}),$$

where

$$V_{k,j} =_{\mathcal{D}} \prod_{l=1}^{k} \xi_{u(k,j,l-1)}^{\,u_l(k,j)}\bigl(1-\xi_{u(k,j,l-1)}\bigr)^{1-u_l(k,j)}$$

jointly in $j = 1,\dots,2^k$. Further, given $N_{n,k,j} = m_j$, the trees $T_n^{k,j}$ are
independent with

$$T_n^{k,j} =_{\mathcal{D}} T_{m_j}, \quad j = 1,\dots,2^k,$$

so that the familiar asymptotics of harmonic numbers, together with
Theorem 2, imply that the random vector in (21) converges in distribution to

$$(\eta_{k,1},\dots,\eta_{k,2^k},\ \rho_{k,1},\dots,\rho_{k,2^k})$$

as $n\to\infty$, with $\eta_{k,j}$, $\rho_{k,j}$ as in the statement of the theorem. The theorem
now follows by appropriately combining elements. □

Hence, in contrast to the silhouette itself, where the individual random 
variables are asymptotically independent, we now have limiting finite-di- 
mensional distributions that might be compatible with a limit process that 
has somewhat regular (e.g., continuous) paths. Below, we will see that the 
representation of the finite-dimensional distributions given in (18) provides 
the key to the proof of path properties of the limit process. 

To obtain convergence in distribution, we need tightness of the sequence
$(Z_n)_{n\in\mathbb{N}}$. For this, we require a technical detail that we state separately as
a lemma. We say that a family $(X_i)_{i\in I}$ of nonnegative random variables
has uniformly exponentially decreasing tails if, for some constants $\kappa > 0$ and
$C < \infty$,

(22)  $P(X_i\ge x) \le C\exp(-\kappa x) \quad\text{for all } x\ge 0 \text{ and } i\in I.$

Some obvious properties of this notion, such as stability with respect to
taking sums, will be used below without further comment.

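Lemma 5. Let $(X_n)_{n\in\mathbb{N}_0}$ and $(A_n)_{n\in\mathbb{N}}$ be sequences of nonnegative
random variables with $X_0 = 0$ and, for all $n\in\mathbb{N}$,

(23)  $X_n \le_{\mathcal{D}} \tfrac12\max\{X_{I_n}, X'_{n-1-I_n}\} + A_n,$

where $(X_n)_{n\in\mathbb{N}_0}$, $(X'_n)_{n\in\mathbb{N}_0}$, $I_n$ are independent, $(X'_n)_{n\in\mathbb{N}_0} =_{\mathcal{D}} (X_n)_{n\in\mathbb{N}_0}$ and
$I_n\sim\mathrm{unif}\{0,\dots,n-1\}$. In this situation, if $(A_n)_{n\in\mathbb{N}}$ has uniformly
exponentially decreasing tails, then so does $(X_n)_{n\in\mathbb{N}_0}$.

Proof. Suppose that, for some $\kappa > 0$ and $C < \infty$,

$$P(A_n\ge x) \le C\exp(-\kappa x) \quad\text{for all } x\ge 0,\ n\in\mathbb{N}.$$

We have to show that there are finite constants $\tilde{\kappa} > 0$ and $\tilde{C} < \infty$ such that

$$P(X_n\ge x) \le \tilde{C}\exp(-\tilde{\kappa}x) \quad\text{for all } x\ge 0,\ n\in\mathbb{N}.$$

If we want to prove this by induction, then the case n = 0 is clear as $X_0 = 0$.
We may also assume that $x\ge x_0 := \log(\tilde{C})/\tilde{\kappa}$ as otherwise the upper bound
is greater than 1. Using (23), we obtain

$$P(X_n\ge x) \le \frac{2}{n}\sum_{m=0}^{n-1} P\Bigl(X_m\ge\frac{3x}{2}\Bigr) + P\Bigl(A_n\ge\frac{x}{4}\Bigr),$$

so the induction step will work if $\tilde{\kappa}$ and $\tilde{C}$ can be chosen such that

$$2\tilde{C}\exp\Bigl(-\frac{3\tilde{\kappa}x}{2}\Bigr) + C\exp\Bigl(-\frac{\kappa x}{4}\Bigr) \le \tilde{C}\exp(-\tilde{\kappa}x) \quad\text{for all } x\ge x_0.$$

This can obviously be done if we first choose $\tilde{\kappa} := \kappa/8$, for example, and then
choose $\tilde{C}$ large enough. □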

We next translate the basic recursion (3) for the raw silhouette into a
recursion for the sequence $(Z_n)_{n\in\mathbb{N}}$ of processes defined in (15). For this, we
require the two linear operators $A, B : C_{00}[0,1]\to C_{00}[0,1]$ given by

$$Af(t) := \tfrac12 f(2t\wedge 1), \qquad Bf(t) := \tfrac12 f((2t-1)^+) \quad\text{for all } t\in[0,1]$$

and the function $\phi:[0,1]\to\mathbb{R}$,

$$\phi(t) := \tfrac12\bigl(t\wedge(1-t)\bigr), \quad 0\le t\le 1.$$
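Lemma 6. For all $n\in\mathbb{N}$,

$$\begin{pmatrix} Y^\circ(T_n)\\ \eta^\circ(T_n)\end{pmatrix} =_{\mathcal{D}}
\begin{pmatrix}
AY^\circ(T_{I_n}) + BY^\circ(T'_{n-1-I_n}) + \bigl(\eta^\circ(T_{I_n}) - \eta^\circ(T'_{n-1-I_n})\bigr)\phi + \bigl(H(I_n) - H(n-1-I_n)\bigr)\phi\\[4pt]
1 + \tfrac12\bigl(\eta^\circ(T_{I_n}) + \eta^\circ(T'_{n-1-I_n})\bigr) + \tfrac12\bigl(H(I_n) + H(n-1-I_n)\bigr) - H_n
\end{pmatrix},$$

where $(T_n)_{n\in\mathbb{N}_0}$, $(T'_n)_{n\in\mathbb{N}_0}$ and $I_n$ are independent, $(T_n)_{n\in\mathbb{N}_0} =_{\mathcal{D}} (T'_n)_{n\in\mathbb{N}_0}$
and $I_n\sim\mathrm{unif}(\{0,\dots,n-1\})$.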

Proof. For a fixed nonempty tree T, the basic recurrence (3) gives

$$Y_t(T) = t + \tfrac12\,Y_{2t\wedge 1}(L(T)) + \tfrac12\,Y_{(2t-1)^+}(R(T))$$

and a straightforward calculation results in

$$Y_t^\circ(T) = \bigl(AY^\circ(L(T))\bigr)_t + \bigl(BY^\circ(R(T))\bigr)_t + \bigl(\eta(L(T)) - \eta(R(T))\bigr)\phi(t).$$

Similarly,

$$\eta^\circ(T) = 1 + \tfrac12\bigl(\eta(L(T)) + \eta(R(T))\bigr) - H(\#T).$$

From these, the statement follows on using the distributionally recursive
structure of BST sequences explained at the beginning of Section 4. □

We now introduce the space $\mathcal{M}$ of probability measures $\mu$ on the Borel
subsets of the space S defined in (16) that satisfy the conditions

$$\int\bigl(\|f\|_\infty^2 + x^2\bigr)\,\mu(df,dx) < \infty \quad\text{and}\quad \int x\,\mu(df,dx) = 0.$$

On $\mathcal{M}$, we define a metric d by

$$d(\mu,\nu)^2 := \inf\bigl\{\max\{E\|Y-\tilde{Y}\|_\infty^2,\ 7E(\eta-\tilde{\eta})^2\} : (Y,\eta)\sim\mu,\ (\tilde{Y},\tilde{\eta})\sim\nu\bigr\}.$$

The factor 7 will be useful in the proof of Lemma 7 below. As at the end of
the proof of Theorem 2, we now construct a (nonlinear) operator $\Psi:\mathcal{M}\to\mathcal{M}$
whose definition is motivated by passing to the limit in the recursion given
in Lemma 6: for $\mu\in\mathcal{M}$, let $\Psi(\mu)$ be the joint distribution of the random
function

$$AY + BY' + (\eta-\eta')\phi + \bigl(\log\xi - \log(1-\xi)\bigr)\phi$$

and the real random variable

$$1 + \tfrac12(\eta+\eta') + \tfrac12\bigl(\log\xi + \log(1-\xi)\bigr),$$

where $(Y,\eta)$, $(Y',\eta')$ and $\xi$ are independent, $(Y,\eta)\sim\mu$, $(Y',\eta')\sim\mu$ and
$\xi\sim\mathrm{unif}(0,1)$. It is easy to check that $\Psi$ indeed maps $\mathcal{M}$ into $\mathcal{M}$.

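Lemma 7. $\Psi$ is a strong contraction on $(\mathcal{M}, d)$.

Proof. Let $\mu$ and $\nu$ be elements of $\mathcal{M}$. For any given
$\varepsilon > 0$, we can find $(Y,\eta) \sim \mu$ and $(\tilde Y,\tilde\eta) \sim \nu$
such that

$$\max\{E\|Y - \tilde Y\|_\infty^2,\ 7E(\eta - \tilde\eta)^2\} \le d(\mu,\nu)^2 + \varepsilon.$$

Now, let $(Y',\eta',\tilde Y',\tilde\eta')$ be an independent copy of
$(Y,\eta,\tilde Y,\tilde\eta)$ and let $\xi \sim \mathrm{unif}(0,1)$ be independent of
the two random quantities $(Y,\eta,\tilde Y,\tilde\eta)$ and
$(Y',\eta',\tilde Y',\tilde\eta')$. By the definition of the operator $\Psi$ and the
metric $d$,

$$d(\Psi(\mu),\Psi(\nu))^2 \le \max\bigl\{E\|A(Y-\tilde Y) + B(Y'-\tilde Y') + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2,\ 7E\bigl(\tfrac12((\eta-\tilde\eta) + (\eta'-\tilde\eta'))\bigr)^2\bigr\}.$$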

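For the second component, we use independence of $\eta-\tilde\eta$ and
$\eta'-\tilde\eta'$ and the fact that both have the same distribution, with mean
zero by the definition of $\mathcal{M}$, to obtain

$$7 \cdot E\bigl(\tfrac12((\eta-\tilde\eta) + (\eta'-\tilde\eta'))\bigr)^2 = 7 \cdot \tfrac12 E(\eta-\tilde\eta)^2 \le \tfrac12(d(\mu,\nu)^2 + \varepsilon).$$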

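The starting point for a similar analysis of the more complicated first part
is the observation that $Af$ vanishes on $[\tfrac12, 1]$ and that $Bf$ vanishes on
$[0, \tfrac12]$ for all $f \in C_{00}[0,1]$ so that, splitting the supremum accordingly,

$$\|A(Y-\tilde Y) + B(Y'-\tilde Y') + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2 \le \max\bigl\{\|A(Y-\tilde Y) + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2,\ \|B(Y'-\tilde Y') + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2\bigr\}.$$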

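The sum of the two terms provides an upper bound for the maximum, hence,

$$E\|A(Y-\tilde Y) + B(Y'-\tilde Y') + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2 \le E\|A(Y-\tilde Y) + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2 + E\|B(Y'-\tilde Y') + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2.$$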

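The two terms on the right-hand side have the same structure. Using the
triangle inequality for the supremum norm, $\|\phi\|_\infty \le \tfrac14$,
$\|Af\|_\infty \le \tfrac12\|f\|_\infty$ and Minkowski's inequality, we obtain

$$E\|A(Y-\tilde Y) + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2 \le E\bigl(\tfrac12\|Y-\tilde Y\|_\infty + \tfrac14|(\eta-\tilde\eta) - (\eta'-\tilde\eta')|\bigr)^2 \le \Bigl(\tfrac12\bigl(E\|Y-\tilde Y\|_\infty^2\bigr)^{1/2} + \tfrac14\bigl(E((\eta-\tilde\eta) - (\eta'-\tilde\eta'))^2\bigr)^{1/2}\Bigr)^2.$$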

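Inside the large outer brackets, we now use

$$E\|Y-\tilde Y\|_\infty^2 \le d(\mu,\nu)^2 + \varepsilon$$

and

$$E((\eta-\tilde\eta) - (\eta'-\tilde\eta'))^2 = 2E(\eta-\tilde\eta)^2 \le \tfrac27(d(\mu,\nu)^2 + \varepsilon),$$

which, combined, lead to the upper bound

$$E\|A(Y-\tilde Y) + ((\eta-\tilde\eta) - (\eta'-\tilde\eta'))\phi\|_\infty^2 \le \Bigl(\tfrac12(d(\mu,\nu)^2 + \varepsilon)^{1/2} + \tfrac{1}{\sqrt{56}}(d(\mu,\nu)^2 + \varepsilon)^{1/2}\Bigr)^2 \le c\,(d(\mu,\nu)^2 + \varepsilon)$$

with some constant $c < 1/2$. Using the same arguments with the terms
involving the operator $B$, we arrive at

$$d(\Psi(\mu),\Psi(\nu))^2 \le \max\{2c, \tfrac12\}\,(d(\mu,\nu)^2 + \varepsilon).$$

Since $c$ does not depend on $\varepsilon$, we now obtain the strong contraction
property on letting $\varepsilon$ tend to 0. □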
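For orientation, the contraction constant can be made explicit. The following is a sketch under two assumptions: that the weight in the metric $d$ is the number 7, and that $\phi(t) = \tfrac12(t \wedge (1-t))$, so that $\|\phi\|_\infty = \tfrac14$. The value of $c$ shown is one admissible choice and not necessarily the one intended in the proof.

```latex
\[
  c = \Bigl(\tfrac12 + \tfrac14\sqrt{\tfrac27}\Bigr)^{2}
    = \Bigl(\tfrac12 + \tfrac{1}{\sqrt{56}}\Bigr)^{2}
    \approx 0.4015 < \tfrac12,
  \qquad
  \max\{2c, \tfrac12\} \approx 0.8030 < 1 .
\]
```

Under these assumptions, any weight larger than roughly 2.92 in place of 7 would still give $c < 1/2$.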

Theorem 8. With $Z_n$, $n \in \mathbb{N}$, and $\Psi$ as above, $Z_n \to_{\mathcal D} Z_\infty$ as $n \to \infty$,
where the distribution of $Z_\infty$ is the unique fixed point of the operator $\Psi$.

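Proof. Let

$$W_n(\delta) := \sup_{0 \le s,t \le 1,\ |s-t| \le \delta} |Y^\circ_t(T_n) - Y^\circ_s(T_n)|, \qquad \delta > 0,$$

be the modulus of continuity of the process $Y^\circ(T_n)$. Using
$|\phi(s) - \phi(t)| \le |s-t|/2$ and Lemma 6, we obtain

$$W_n(\delta) \le_{\mathcal D} \tfrac12\max\{W_{I_n}(2\delta), W'_{n-1-I_n}(2\delta)\} + \tfrac{\delta}{2}\,|\eta^\circ(T_{I_n}) + \eta^\circ(T'_{n-1-I_n})| + \tfrac{\delta}{2}\,|H(I_n) - H(n-1-I_n)|, \tag{24}$$

where $(W_n)_{n\in\mathbb{N}}$, $(W'_n)_{n\in\mathbb{N}}$ and $I_n$ are independent,
$(W_n)_{n\in\mathbb{N}} =_{\mathcal D} (W'_n)_{n\in\mathbb{N}}$
and $I_n \sim \mathrm{unif}(\{0,\ldots,n-1\})$. Now, let

$$W_n := \sup_{0 < \delta \le 1} \delta^{-1/2}\, W_n(\delta).$$

Clearly, $W_0 = 0$, and (24) implies that

$$W_n \le_{\mathcal D} \tfrac{1}{\sqrt2}\max\{W_{I_n}, W'_{n-1-I_n}\} + A_n + B_n \qquad \text{for all } n \in \mathbb{N}$$

with

$$A_n := |\eta^\circ(T_{I_n}) + \eta^\circ(T'_{n-1-I_n})|, \qquad B_n := |H(I_n) - H(n-1-I_n)|$$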

and the usual distributional assumptions. Equation (13) in Theorem 3 implies
that $(A_n)_{n\in\mathbb{N}}$ has uniformly exponentially decreasing tails. To
obtain the same property for $B_n$, we first observe that it is enough to treat
$H(n) - H(I_n)$, which is nonnegative. It is easy to check that
$H(n)1_{\{I_n = 0\}}$ has uniformly exponentially decreasing tails. Further, we have

$$(H(n) - H(I_n))\,1_{\{I_n > 0\}} \le_{\mathcal D} -\log\xi + 1$$

with $\xi \sim \mathrm{unif}(0,1)$, so this term also has the required tail property.
Therefore, Lemma 5 can be applied, leading to a uniform upper bound for the tails
of $W_n$. In particular,

$$\lim_{\delta \downarrow 0}\,\sup_{n\in\mathbb{N}} W_n(\delta) = 0 \qquad \text{in probability},$$

which shows that the sequence $(Y^\circ(T_n))_{n\in\mathbb{N}}$ is tight in
$C_{00}[0,1]$; see Section 8 in Billingsley (1968). In view of Theorem 2,
$(\eta^\circ(T_n))_{n\in\mathbb{N}}$ is also tight in $C_{00}[0,1]$, which, by a
standard argument using Prohorov's theorem, implies tightness of the sequence
$(Z_n)_{n\in\mathbb{N}}$. Convergence of the finite-dimensional distributions was
obtained in Theorem 4. Combining elements, we see that $Z_n \to_{\mathcal D} Z_\infty$
for some $S$-valued random quantity $Z_\infty$. Finally, we can pass to the limit in
the distributional equation given in Lemma 6 and then use Lemma 7, together with
Banach's fixed point theorem, to complete the proof. □

What can be said about the paths of the process part $Y_\infty$ of the limit
$Z_\infty$? The functional limit theorem implies that the paths are continuous as
everything happens in $C([0,1])$. The maximum of the raw silhouette is the
height $H_n$ of the tree. It has been shown by Devroye (1986) that $H_n/\log n$
converges to $c_+ = 4.311\ldots$ in probability as $n \to \infty$. Similarly, the
minimum is the tree's fill (or saturation) level $L_n$ and $L_n/\log n$ converges to
$c_- = 0.373\ldots$ almost surely as $n \to \infty$, a result obtained by Biggins
(1997) in the context of branching random walks. Heuristically, as $c_- < c_+$, we
would therefore expect that the paths of the limit process are not differentiable.
As the next result shows, almost all paths of $Y_\infty$ are indeed not even
Lipschitz, but they are Hölder continuous of order $\alpha$ for all $\alpha < 1$.

Theorem 9. With $Z_\infty = (Y_\infty, \eta_\infty)$ as in Theorem 8, the paths of
$Y_\infty$ are almost surely not Lipschitz continuous and, for all $\alpha < 1$, they
are almost surely Hölder continuous of order $\alpha$.

Proof. Taking $s = 0$ and $t = 2^{-k}$ in (17), we obtain

$$\left|\frac{Y_\infty(t) - Y_\infty(s)}{t-s}\right| =_{\mathcal D} |k + \rho_k + \eta|, \tag{25}$$

where $-\rho_k$ has a gamma distribution with shape parameter $k$ and scale
parameter 1, and $\eta$ is a random variable whose distribution does not depend
on $k$. For $k$ large, $(k + \rho_k)/\sqrt{k}$ is close in distribution to a standard
normal, which shows that the right-hand side of (25) tends to $\infty$ in
probability as $k \to \infty$. This proves the first statement.

For the proof of the second part, we use the Kolmogorov-Chentsov theorem;
see, for example, Kallenberg (1997), Theorem 2.23. Let $\rho_k$ and $\eta$ be as
above; note that $-\rho_k$ can be written as the sum of $k$ independent random
variables that are exponentially distributed with parameter 1. Moments of all
orders exist for these and for $\eta$. Hence, by Minkowski's inequality, for each
$l \in \mathbb{N}$, there is a constant $C_l$ such that

$$E|k + \rho_k + \eta|^l \le C_l\,k^l \qquad \text{for all } k \in \mathbb{N}.$$

If $s$ and $t$ are such that $s = j2^{-k}$ and $t = (j+1)2^{-k}$ for some
$j \in \{0,\ldots,2^k - 1\}$, then we obtain, again using (17),

$$E|Y_\infty(t) - Y_\infty(s)|^l \le |t-s|^l\,|\log_2|t-s||^l\,C_l. \tag{26}$$

The desired Hölder continuity now follows on using two facts: first, the
right-hand side of (26) can be bounded by a constant multiple of
$|t-s|^{l-\delta}$ for all $\delta > 0$ and we may choose an arbitrarily large value
for $l$; second, the chaining proof of the Hölder part of the Kolmogorov-Chentsov
theorem makes use of the moment bounds only for values of $s$ and $t$ that are of
the above form, that is, binary rational neighbors. □

6. Remarks. In this final section, we briefly discuss another family of 
search trees, comment on the methodology and close with a final remark on 
the "big picture." 

(i) As in the previous sections, let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of
independent random variables, all uniformly distributed on the unit interval. The
DST (digital search tree) algorithm uses the binary expansion of the values as a
directive of how to travel through the binary tree, storing each value in the
first free (i.e., the unique external) node; again, we refer to Knuth (1973),
Mahmoud (1992) and Sedgewick and Flajolet (1996) for more information.
As in the BST case, the algorithm produces a sequence $(T_n)_{n\in\mathbb{N}}$ of
random trees, where $T_n$ is the DST output for the first $n$ variables
$\xi_1,\ldots,\xi_n$ of the sequence. In contrast to the BST situation, we no
longer have invariance of the resulting random structures under strictly monotone
transformations of the input data. However, we still have a simple "stochastic
dynamics": in both cases, $T_{n+1}$ is obtained by adding a randomly selected
element of $\partial T_n$ to $T_n$, but, whereas in the BST case, one of the $n+1$
external nodes of $T_n$ is chosen uniformly, in the DST case, it is chosen with
probability $2^{-k}$, where $k$ is the height of the external node (the fact that
these values sum to 1 has already been mentioned in the proof of Theorem 2). The
silhouette of such a tree, in raw and in normalized integrated form, is shown in
Figure 2. It is "visually obvious" when comparing this with Figure 1 that these
functions are quite different in the BST and DST cases (Figures 1 and 2 are based
on the same input and use the same scale).
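The two growth rules can be sketched in a few lines of Python. This is an illustration of the "stochastic dynamics" only, not of the BST and DST algorithms acting on input data: trees are represented as prefix-stable sets of binary strings (with the empty string as root), which is a convention chosen here, and all function names are hypothetical.

```python
import random
from fractions import Fraction

def external_nodes(tree):
    """External nodes of a prefix-stable set of binary strings; '' is the root."""
    if not tree:
        return [""]
    return sorted(u + b for u in tree for b in "01" if u + b not in tree)

def grow(tree, rule, rng):
    """Add one external node: uniformly for 'BST', with probability 2**-k
    (k = level of the external node) for 'DST'."""
    ext = external_nodes(tree)
    if rule == "BST":
        weights = [1.0] * len(ext)
    elif rule == "DST":
        weights = [2.0 ** -len(u) for u in ext]
    else:
        raise ValueError("rule must be 'BST' or 'DST'")
    new_node = rng.choices(ext, weights=weights, k=1)[0]
    return tree | {new_node}

def simulate(n, rule, seed=0):
    """Tree with n nodes grown from the empty tree under the given rule."""
    rng = random.Random(seed)
    tree = frozenset()
    for _ in range(n):
        tree = grow(tree, rule, rng)
    return tree

def kraft_sum(tree):
    """Exact sum of 2**-k over the external nodes; equals 1 for every finite binary tree."""
    return sum(Fraction(1, 2 ** len(u)) for u in external_nodes(tree))
```

For instance, `simulate(100, "DST")` and `simulate(100, "BST")` both return trees with 101 external nodes, and `kraft_sum` confirms that the DST selection weights form a probability distribution.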

A first point of interest is the fact that the associated discounted external
path length $\eta_n$ now has a direct algorithmic interpretation as the conditional
expectation of the number of bit checks necessary to insert the next data
value $\xi_{n+1}$ into the tree $T_n$, given $T_n$. Generally, the DST output is
closer to the "ideal" search tree of minimal height, but makes stronger
assumptions on the nature of the input. One indication of this is the fact that
distributional fluctuations appear in the DST situation; indeed, if we always
stored the next item in an external node of minimal height, we would have

$$\eta_{n,\mathrm{opt}} - \log_2 n = \psi(\{\log_2 n\}),$$

where $\{x\}$ denotes the fractional part of $x$ and $\psi(x) := 2^x - 1 - x$,
$0 \le x < 1$. Some heuristic arguments support the conjecture that the
expectation of $\eta_n$ differs from $\eta_{n,\mathrm{opt}}$ by an asymptotically
negligible amount and that the variance of $\eta_n$ tends to 0 as $n \to \infty$.
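The identity for the greedy ("always fill an external node of minimal height") tree can be checked directly. The sketch below is stated in terms of the number $m$ of external nodes of the tree, which is an assumption about the indexing convention behind $\eta_{n,\mathrm{opt}}$; the discounted external path length is taken to be the sum of $k\,2^{-k}$ over the levels $k$ of the external nodes. Function names are hypothetical.

```python
import heapq
import math
from fractions import Fraction

def eta_opt(n):
    """Discounted external path length after n greedy insertions, each at an
    external node of minimal level (exact rational arithmetic)."""
    levels = [0]  # the empty tree has a single external node at level 0
    for _ in range(n):
        k = heapq.heappop(levels)      # external node of minimal level
        heapq.heappush(levels, k + 1)  # it is replaced by two external
        heapq.heappush(levels, k + 1)  # nodes one level further down
    return sum(Fraction(k, 2 ** k) for k in levels)

def psi(x):
    return 2.0 ** x - 1.0 - x

def eta_closed_form(m):
    """log2(m) + psi({log2 m}) for a tree with m external nodes."""
    log_m = math.log2(m)
    return log_m + psi(log_m - math.floor(log_m))
```

Comparing `float(eta_opt(n))` with `eta_closed_form(n + 1)` for a range of `n` shows the periodic fluctuation in $\{\log_2 m\}$ that the text refers to.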

As a second point of interest, we note that the analysis of the associated
silhouette processes begins to bifurcate at the earliest possible point, that is,
in the situation considered in Theorem 1. As a result of the stochastic
dynamics explained above, the movement along a particular path $s \in [0,1)$, that
is, the behaviour of $X_s(T_n)$ for $n = 1, 2, \ldots$, is a Markov chain of
pure-birth type with state space $\mathbb{N}$ and birth rates $p_{k,k+1} = 2^{-k}$.
The associated distributions converge along suitably chosen subsequences if we
simply subtract $\log n$; see Dennert and Grübel (2007) for a recent probabilistic
approach. Recall that in the BST case, the variance of $X_s(T_n)$ grows at a
logarithmic rate and that a suitably rescaled version of $X_s(T_n)$ is
asymptotically normal. Moreover, the random variables $X_s(T_n)$ and $X_t(T_n)$ are
asymptotically independent by Theorem 1 if $s \ne t$; in particular, the absolute
difference between the two converges to $\infty$ in probability as $n \to \infty$.
In the DST case, however, it follows easily from the result mentioned above that
the family of distributions of the differences $X_t(T_n) - X_s(T_n)$,
$n \in \mathbb{N}$, is tight. Again on the basis of heuristic arguments, I
conjecture that the distributional periodicities disappear in an appropriately
standardized version of the silhouette, such as
$(X_t(T_n) - X_0(T_n))_{0 \le t \le 1}$.

Fig. 2. An example of a silhouette (left) and the corresponding normalized integrated
silhouette (right) in the DST case.
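The pure-birth chain along a fixed path in the DST case is easy to simulate. The sketch below assumes that the chain starts at level 0 for the empty tree and that, at each step, it moves from level $k$ to $k+1$ with probability $2^{-k}$ and otherwise stays put; the function name is hypothetical.

```python
import random

def dst_path_levels(n, seed=0):
    """Levels X_s(T_0), ..., X_s(T_n) along one fixed path in the DST case:
    from level k the chain moves to k + 1 with probability 2**-k."""
    rng = random.Random(seed)
    level = 0
    path = [level]
    for _ in range(n):
        if rng.random() < 2.0 ** -level:
            level += 1
        path.append(level)
    return path
```

Inspecting the final entry of `dst_path_levels(n)` minus the base-2 logarithm of `n` over several seeds and values of `n` gives a rough impression of the tightness discussed above; the simulation is only suggestive and proves nothing.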

(ii) In our proofs, we have used martingale results and contraction arguments.
A survey of the contraction method in the context of the analysis of
algorithms is given in Rösler and Rüschendorf (2001) and, with emphasis on
the multivariate case, in Neininger and Rüschendorf (2006). A first use of
the contraction method in connection with the analysis of algorithms on the
level of stochastic processes, as in the present paper, can be found in Grübel
and Rösler (1996). Roughly, martingale arguments often provide almost sure
convergence in cases where the contraction method only yields convergence
in distribution, but the latter seems to have advantages if our interest is in
the properties of the limit distribution. The two methods are closely related
to the complementary aspects of BST sequences, the dynamic structure and
the distributionally recursive structure, that we have used repeatedly in the
previous sections.

(iii) According to Knuth (1997), page 308, trees are "the most important
nonlinear structures that arise in computer algorithms." Given a sequence
of input data, both the BST and the DST algorithms generate a sequence
$(T_n)_{n\in\mathbb{N}}$ of binary trees that grow by one node at a time. As has
been shown in Luczak and Winkler (2004), even in the case of uniformly distributed
trees, there is a dynamical procedure that builds these structures in this
sequential manner. From that point of view, in all three cases, the stochastic
process $(T_n)_{n\in\mathbb{N}}$ of trees is a transient Markov chain with a
denumerable state space $E$, with $E$ the set of all finite and prefix-stable
subsets of the denumerable set $N$ of nodes. One would expect that, in a rough
sense, the limit is always the complete binary tree $T_\infty$. However, this is
not true in the uniform case; see Luczak and Winkler (2004) and the references
given therein. For the search trees that we have considered in the present paper,
in contrast, the fill level converges to $\infty$ with probability 1, so a simple
compactification $E_\infty := E \cup \{T_\infty\}$ makes $(T_n)_{n\in\mathbb{N}}$ a
sequence that converges with probability 1, if convergence means that every
$u \in N$ will eventually be an element of $T_n$.
From a general theoretical point of view, the results of the present paper
can be regarded as a first step toward a more detailed asymptotic analysis,
going beyond the one-point compactification toward classical notions such as
the Martin boundary; see Sawyer (1997). In this connection, it is interesting
to note that Régnier's (1989) analysis of Quicksort is based on one specific
harmonic function associated with the Markov chain that arises in the BST
case.